The Myriad Virtues of Subword Trees
Authors
Abstract
Several nontrivial applications of subword trees have been developed since their first appearance. Some such applications depart considerably from the original motivations. A brief account of them is attempted here.
INTRODUCTION
Subword trees fit in the general subject of digital search indexes [KN]. In fact, their earliest conception is somewhat implicit in Morrison's 'PATRICIA' tries [MO]. Several linear time and space subword tree constructions are available today [MC, PR, SL] (see also [AH]), following the pioneering work by Weiner [WE]. More compact alternate versions have been introduced recently in [BL, BE, CS2]. The data structures developed in this endeavor are variously referred to as B-trees, position trees, suffix (or prefix) trees, subword trees, repetition finders, directed acyclic word graphs, etc. A concise account of the similarities and discrepancies among the various approaches is presented in [SE1, CS1]. On-line (though not linear time) constructions are discussed in [MR]. In this paper, we choose to refer mostly to the version in [MC], to which we also conform as much as possible for basic definitions and notation. However, the properties presented here are to a large extent independent of the particular incarnation of a subword tree, and, from the conceptual standpoint, so are the associated criteria and constructions. This paper addresses a reader with little previous exposure to the subject, but it does assume some familiarity with elementary facts and concepts in combinatorics on words. The paper is also self-contained in the description of the various applications presented. However, some proofs are only sketched; the reader is also pointed to the referenced literature when it comes to constructions too elaborate to be given here in full detail. Finally, the list given here is not meant to be exhaustive. In particular, it reflects some recent involvements of this author, and his personal perspective.
The paper is organized as follows. Basic properties and applications of subword trees are outlined in the next section. In Section 2, such trees are treated as a unifying framework for the description of a class of linear time sequential data compression techniques that is becoming increasingly popular. In Section 3, we take steps from one such data compression paradigm and use subword trees to decide whether a word contains a square subword in linear time. We show next how subword trees can be used also to spot all such squares, as well as to establish bounds on the number of cube subwords in a string. Augmented subword trees are suited to allocate the statistics without overlap of all subwords of a textstring, as highlighted in Section 4. In Section 5, we mention two applications in which subword trees are outperformed by other approaches.
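To make the two central notions above concrete, the following is a minimal Python sketch, not the paper's construction. It builds an uncompacted subword tree (a trie of all suffixes) in quadratic time and space, whereas the constructions cited in the abstract are linear, and it tests for a square subword (a nonempty subword of the form ww) by brute force rather than through the tree. The function names `build_subword_trie`, `is_subword`, and `has_square`, and the use of `$` as an end marker assumed absent from the text, are illustrative choices only.

```python
def build_subword_trie(text):
    """Insert every suffix of text + '$' into a nested-dict trie.

    Assumption: '$' does not occur in text. This is the uncompacted
    form of a subword tree; it takes quadratic time and space.
    """
    root = {}
    s = text + "$"
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def is_subword(root, w):
    """Return True if w labels a path from the root, i.e. w occurs in the text."""
    node = root
    for ch in w:
        if ch not in node:
            return False
        node = node[ch]
    return True


def has_square(text):
    """Brute-force check for a nonempty subword of the form ww."""
    n = len(text)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if text[i:i + half] == text[i + half:i + 2 * half]:
                return True
    return False


if __name__ == "__main__":
    trie = build_subword_trie("abaab")
    print(is_subword(trie, "baa"), is_subword(trie, "bb"))
    print(has_square("abaab"), has_square("abcacb"))
```

Every subword of the text is a prefix of some suffix, which is why a single walk from the root answers membership queries. A linear-time square test of the kind announced for Section 3 would replace the brute-force loop.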
Similar resources
A speed-up for the commute between subword trees and DAWGs
A popular way to describe and build the DAWG or Directed Acyclic Word Graph of a string is by transformation of the corresponding subword tree. This transformation, which is not difficult to reverse, is easy to grasp and almost trivial to implement except for the assumed implication of a standard tree isomorphism algorithm. Here we point out a simple property of subword trees that makes checkin...
Some properties of a class of polyhedral semigroups based upon the subword reversing method
In this paper, a certain class of polyhedral semigroups that has a presentation $$ is examined. The completeness of the presentation and the solvability of the word problem for this class of semigroups are determined. Moreover, the combinatorial distance between two words is determined.
The Myriad Virtues of Wavelet Trees
Article history: Received 20 March 2008 Revised 17 November 2008 Available online 29 January 2009 Wavelet Trees have been introduced by Grossi et al. in SODA 2003 and have been rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compression algorithms. Although several papers have investigated the properties and usefulness of this data structure in...
EL-labelings and canonical spanning trees for subword complexes
SUBWORD COMPLEXES (W,S) finite Coxeter system, Q = q1q2 · · · qm ∈ S∗, and ρ ∈W . Subword complex SC(Q, ρ) = simplicial complex with • vertices = [m] = positions in Q, • facets =F(Q, ρ) = complements of reduced expressions of ρ in Q. Exm. Q = τ2τ3τ1τ3τ2τ1τ2τ3τ1 in (S4, {(i i+ 1)}) ρ = [4, 1, 3, 2] = τ2τ3τ2τ1 = τ3τ2τ3τ1 = τ3τ2τ1τ3 F(Q, ρ) = {1, 2, 3, 5, 6}, {1, 2, 3, 6, 7}, {1, 2, 3, 7, 9}, {1, ...
The Structure of Subword Graphs and Suffix Trees of Fibonacci Words
We use automata-theoretic approach to analyze properties of Fibonacci words. The directed acyclic subword graph (dawg) is a useful deterministic automaton accepting all suffixes of the word. We show that dawg’s of Fibonacci words have particularly simple structure. Our main result is a unifying framework for a large collection of relatively simple properties of Fibonacci words. The simple struc...